Building Webcorpora of Academic Prose with BootCaT

Author

  • George Dillon
Abstract

A procedure is described to gather corpora of academic writing from the web using BootCaT. The procedure uses terms distinctive of different registers and disciplines in COCA to locate and gather web pages containing them.
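
The abstract does not spell out the query-building step. As a rough illustration only: BootCaT-style pipelines generally combine seed terms into random tuples and submit each tuple as a search query. The sketch below assumes that approach. The function names (`build_query_tuples`, `to_query`), the tuple size, and the seed terms are all hypothetical; none are taken from the paper, from COCA, or from the BootCaT code itself.

```python
import random
from itertools import combinations


def build_query_tuples(seeds, tuple_size=3, n_tuples=5, rng_seed=0):
    """Return up to n_tuples distinct random combinations of seed terms."""
    unique_seeds = sorted(set(seeds))
    # Enumerate every unordered combination, then sample without replacement
    # so that no query is issued twice.
    all_tuples = list(combinations(unique_seeds, tuple_size))
    rng = random.Random(rng_seed)  # fixed seed keeps the sketch reproducible
    k = min(n_tuples, len(all_tuples))
    return rng.sample(all_tuples, k)


def to_query(term_tuple):
    """Join terms into one query string, quoting multiword terms as phrases."""
    return " ".join(f'"{t}"' if " " in t else t for t in term_tuple)


# Hypothetical seed terms for illustration; NOT taken from COCA or the paper.
seeds = ["hypothesis", "methodology", "empirical",
         "theoretical framework", "findings"]

queries = [to_query(t) for t in build_query_tuples(seeds)]
for q in queries:
    print(q)
```

Each resulting string would then be sent to a search engine, and the returned URLs downloaded and cleaned to form the corpus. Those steps depend on the search API in use and are left out here.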

Similar resources

Stephen Hawking's Community-Bound Voice A Functional Investigation of Self-Mentions in Stephen Hawking's Scientific Prose

Thanks to the development of the concept of metadiscourse, it is now widely acknowledged that academic/scientific writing is not only concerned with communicating purely propositional meanings: what is communicated through academic/scientific communication is seen to be intertwined with the negotiation of social and interpersonal meanings. While a large number of so-called metadiscoursal resour...

GrawlTCQ: Terminology and Corpora Building by Ranking Simultaneously Terms, Queries and Documents using Graph Random Walks

In this paper, we present GrawlTCQ, a new bootstrapping algorithm for building specialized terminology, corpora and queries, based on a graph model. We model links between documents, terms and queries, and use a random walk with restart algorithm to compute relevance propagation. We have evaluated GrawlTCQ on an AFP English corpus of 57,441 news over 10 categories. For corpora building, GrawlTC...

Avoiding Prolixity in Academic Prose; the Use of Quantity Metadiscourse in Research Articles

As part of a wider attempt to bestow the spirit of scholarly prose upon the research articles’ rhetorical structure, academic writers invariably take advantage of quantity metadiscourse markers to avoid prolixity and live up to the implicit and explicit maxims of quantity category as suggested in Gricean CP and similar models. In order to develop a clear understanding of quantity strategies di...

Plenty of Fish in the Academy: On Marshall McLuhan’s Prose as an Anti-Environment

The purpose of this synthesis is to deconstruct the medium of Marshall McLuhan’s prose as an anti-environment for the medium of traditional academic writing. By placing McLuhan’s own theory in dialogue with the founding principles of linguistic anthropology, I will argue that McLuhan’s authorial tactics—a subject of his long-term repudiation by the academic community on the whole—adhered to the...

Retrieving Japanese specialized terms and corpora from the World Wide Web

The BootCaT toolkit (Baroni and Bernardini, 2004) is a suite of Perl programs implementing a procedure to bootstrap specialized corpora and terms from the web using minimal knowledge sources. In this paper, we report ongoing work in which we apply the BootCaT procedure to a Japanese corpus and term extraction task in the hotel terminology domain. The results of our experiments are very encourag...

Publication date: 2010